Abstract. It is commonly assumed that the ability to track the frequencies
of a set of schemata in the evolving population of an infinite
population genetic algorithm (IPGA) under different fitness functions 
will advance efforts to obtain a theory of adaptation for the simple GA. 
Unfortunately, for IPGAs with long genomes and non-trivial fitness func- 
tions there do not currently exist theoretical results that allow such a 
study. We develop a simple framework for analyzing the dynamics of 
an infinite population evolutionary algorithm (IPEA). This framework 
derives its simplicity from its abstract nature. In particular we make no 
commitment to the data-structure of the genomes, the kind of variation 
performed, or the number of parents involved in a variation operation. 
We use this framework to derive abstract conditions under which the 
dynamics of an IPEA can be coarse-grained. We then use this result to 
derive concrete conditions under which it becomes computationally fea- 
sible to closely approximate the frequencies of a family of schemata of 
relatively low order over multiple generations, even when the bitstrings
in the evolving population of the IPGA are long. 



1 Introduction 

It is commonly assumed that theoretical results which allow one to track the 
frequencies of schemata in an evolving population of an infinite population ge- 
netic algorithm (IPGA) under different fitness functions will lead to a better 
understanding of how GAs perform adaptation [7, 6, 8]. An IPGA with genomes
of length $\ell$ can be modelled by a set of $2^\ell$ coupled difference equations. For each
genome in the search space there is a corresponding state variable which gives 
the frequency of the genome in the population, and a corresponding difference 
equation which describes how the value of that state variable in some gener- 
ation can be calculated from the values of the state variables in the previous 
generation. A naive way to calculate the frequency of some schema over multiple 
generations is to numerically iterate the IPGA over many generations, and for 
each generation, to sum the frequencies of all the genomes that belong to the 
schema. The simulation of one generation of an IPGA with a genome set of size 
$N$ has time complexity $O(N^3)$, and an IPGA with bitstring genomes of length $\ell$
has a genome set of size $N = 2^\ell$. Hence, the time complexity for a numeric
simulation of one generation of an IPGA is $O(8^\ell)$. (See [19, p. 36] for a description of
how the Fast Walsh Transform can be used to bring this bound down to $O(3^\ell)$.)
Even when the Fast Walsh Transform is used, computation time still increases
exponentially with $\ell$. Therefore, for large $\ell$, the naive way of calculating the
frequencies of schemata over multiple generations clearly becomes computationally
intractable (see footnote 1).
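The summation step of the naive approach can be sketched in a few lines of Python. This is a minimal illustration only: the dictionary representation of a population, the use of `'*'` as the wildcard symbol, and the helper name `schema_frequency` are assumptions made for the example, not part of the paper. The cost of the loop is proportional to the number of genomes, i.e. $2^\ell$ for bitstrings of length $\ell$.

```python
from itertools import product

def schema_frequency(population, schema):
    """Sum the frequencies of all genomes that match `schema`.

    `population` maps each bitstring (str) to its frequency; `schema` is a
    string over {'0', '1', '*'}, where '*' matches either bit.
    (Hypothetical helper written for this illustration.)
    """
    return sum(
        freq
        for genome, freq in population.items()
        if all(s in ('*', g) for s, g in zip(schema, genome))
    )

# Illustrative data: the uniform population over all 2**3 bitstrings of length 3.
length = 3
genomes = [''.join(bits) for bits in product('01', repeat=length)]
uniform = {g: 1.0 / len(genomes) for g in genomes}

freq_1xx = schema_frequency(uniform, '1**')  # four of the eight genomes match
freq_10x = schema_frequency(uniform, '10*')  # two of the eight genomes match
```

Because the loop visits every genome, repeating it for each generation inherits the exponential cost discussed above.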

Holland's schema theorem [7, 6, 8] was the first theoretical result which
allowed one to calculate (albeit imprecisely) the frequencies of schemata after a 
single generation. The crossover and mutation operators of a GA can be thought 
to destroy some schemata and construct others. Holland only considered the 
destructive effects of these operators. His theorem was therefore an inequality. 
Later work [15] contained a theoretical result which gives exact values for the
schema frequencies after a single generation. Unfortunately for IPGAs with long 
bitstrings this result does not straightforwardly suggest conditions under which 
schema frequencies can be numerically calculated over multiple generations in a 
computationally tractable way. 

1.1 The Promise of Coarse-Graining

Coarse-graining is a technique that has widely been used to study aggregate 
properties (e.g. temperature) of many-body systems with very large numbers 
of state variables (e.g. gases). This technique allows one to reduce some sys- 
tem of difference or differential equations with many state variables (called the 
fine-grained system) to a new system of difference or differential equations that 
describes the time-evolution of a smaller set of state variables (the coarse-grained 
system) . The state variables of the fine-grained system are called the microscopic 
variables and those of the coarse-grained system are called the macroscopic vari- 
ables. The reduction is done using a surjective non-injective function between 
the microscopic state space and the macroscopic state space called the partition 
function. States in the microscopic state space that share some key property 
(e.g. energy) are projected to a single state in the macroscopic state space. The 
reduction is therefore 'lossy', i.e. information about the original system is typi- 
cally lost. Metaphorically speaking, just as a stationary light bulb projects the 
shadow of some moving 3D object onto a flat 2D wall, the partition function 
projects the changing state of the fine-grained system onto states in the state 
space of the coarse-grained system. 

The term 'coarse-graining' has been used in the Evolutionary Computa- 
tion literature to describe different sorts of reductions of the equations of an 
IPGA. Therefore we now clarify the sense in which we use this term. In this 
paper a reduction of a system of equations must satisfy three conditions to be 
called a coarse-graining. Firstly, the number of macroscopic variables should be 
smaller than the number of microscopic variables. Secondly, the new system of
equations must be completely self-contained in the sense that the state variables
in the new system of equations must not be dependent on the microscopic variables.
Thirdly, the dynamics of the new system of equations must 'shadow' the
dynamics described by the original system of equations in the sense that if the
projected state of the original system at time $t = 0$ is equal to the state of the
new system at time $t = 0$ then at any other time $t$, the projected state of the
original system should be closely approximated by the state of the new system.
If the approximation is instead an equality then the reduction is said to be an
exact coarse-graining. Most coarse-grainings are not exact. This specification of
coarse-graining is consistent with the way this term is typically used in the scientific
literature. It is also similar to the definition of coarse-graining given in [12]
(the one difference being that in our specification a coarse-graining is assumed
not to be exact unless otherwise stated).

1 Vose reported in 1999 that computational concerns force numeric simulation to be
limited to cases where $\ell < 20$.

Suppose the vector of state variables $x^{(t)}$ is the state of some system at
time $t$ and the vector of state variables $y^{(t)}$ is the state of a coarse-grained system
at time $t$. Now, if the partition function projects $x^{(0)}$ to $y^{(0)}$, then, since none of
the state variables of the original system are needed to express the dynamics of
the coarse-grained system, one can determine how the state of the coarse-grained
system $y^{(t)}$ (the shadow state) changes over time without needing to determine
how the state in the fine-grained system $x^{(t)}$ (the shadowed state) changes. Thus,
even though for any $t$, one might not be able to determine $x^{(t)}$, one can always be
confident that $y^{(t)}$ is its projection. Therefore, if the number of state variables
of the coarse-grained system is small enough, one can numerically iterate the
dynamics of the (shadow) state vector $y^{(t)}$ without needing to determine the
dynamics of the (shadowed) state vector $x^{(t)}$.

In this paper we give sufficient conditions under which it is possible to 
coarse-grain the dynamics of an IPGA such that the macroscopic variables are 
the frequencies of the family of schemata in some schema partition. If the size 
of this family is small then, regardless of the length of the genome, one can use 
the coarse-graining result to numerically calculate the approximate frequencies 
of these schemata over multiple generations in a computationally tractable way. 
Given some population of bitstring genomes, the set of frequencies of a family of 
schemata describe the multivariate marginal distribution of the population over 
the defined loci of the schemata. Thus another way to state our contribution is
that we give sufficient conditions under which the multivariate marginal distribution
of an evolving population over a small number of loci can be numerically
approximated over multiple generations regardless of the length of the genomes.

We stress that our use of the term 'coarse-graining' differs from the way 
this term has been used in other publications. For instance in [16] the term 
'coarse-graining' is used to describe a reduction of the IPGA equations such 
that each equation in the new system is similar in form to the equations in the 
original system. The state variables in the new system are defined in terms of the 
state variables in the original system. Therefore a numerical iteration of the
new system is only computationally tractable when the length of the genomes 
is relatively short. Elsewhere the term coarse-graining has been defined as "a
collection of subsets of the search space that covers the search space" [5], and as
"just a function from a genotype set to some other set" [1].

1.2 Some Previous Coarse-Graining Results

Techniques from statistical mechanics have been used to coarse-grain GA dynamics
in [9, 10, 11] (see [TS] for a survey of applications of statistical mechanics
approaches to GAs). The macroscopic variables of these coarse-grainings are the
first few cumulants of the fitness distribution of the evolving population. In [12]
several exact coarse-graining results are derived for an IPGA whose variation
operation is limited to mutation.

Wright et al. show in [20] that the dynamics of a non-selective IPGA can be
coarse-grained such that the macroscopic variables are the frequencies of a family
of schemata in a schema partition. However they argue that the dynamics of a
regular selecto-mutato-recombinative IPGA cannot be similarly coarse-grained
"except in the trivial case where fitness is a constant for each schema in a schema
family" [20]. Let us call this condition schematic fitness invariance. Wright et al.
imply that it is so severe that it renders the coarse-graining result essentially
useless.

This negative result holds true when there is no constraint on the initial 
population. In this paper we show that if we constrain the class of initial popu- 
lations then it is possible to coarse-grain the dynamics of a regular IPGA under 
a much weaker constraint on the fitness function. The constraint on the class 
of initial populations is not onerous; this class includes the uniform distribution 
over the genome set. 

1.3 Structure of this Paper 

The rest of this paper is organized as follows: in the next section we define the 
basic mathematical objects and notation which we use to model the dynam- 
ics of an infinite population evolutionary algorithm (IPEA). This framework is 
very general; we make no commitment to the data-structure of the genomes, 
the nature of mutation, the nature of recombination, or the number of parents
involved in a recombination. We do however require that selection be fitness 
proportional. In section 3 we define the concepts of semi-coarsenablity, coarsen- 
ablity and global coarsenablity which allow us to formalize a useful class of exact 
coarse-grainings. In section 4 and section 5 we prove some stepping-stone results 
about selection and variation. We use these results in section 6 where we prove 
that an IPEA that satisfies certain abstract conditions can be coarse-grained. 
The proofs in sections 5 and 6 rely on lemmas which have been relegated to and 
proved in the appendix. In section 7 we specify concrete conditions under which 
IPGAs with long genomes and non-trivial fitness functions can be coarse-grained 
such that the macroscopic variables are schema frequencies and the fidelity of 
the coarse-graining is likely to be high. We conclude in section 8 with a summary 
of our work. 



2 Mathematical Preliminaries 



Let $X, Y$ be sets and let $\beta : X \to Y$ be some function. For any $y \in Y$ we use the
notation $\langle y \rangle_\beta$ to denote the pre-image of $y$, i.e. the set $\{x \in X \mid \beta(x) = y\}$. For
any subset $A \subseteq X$ we use the notation $\beta(A)$ to denote the set $\{y \in Y \mid \beta(a) =
y \text{ and } a \in A\}$.

As in [17], for any set $X$ we use the notation $\Lambda^X$ to denote the set of all
distributions over $X$, i.e. $\Lambda^X$ denotes the set $\{f : X \to [0, 1] \mid \sum_{x \in X} f(x) = 1\}$. For
any set $X$, let $0^X : X \to \{0\}$ be the constant zero function over $X$. For any set
$X$, an $m$-parent transmission function [14, 1, 18] over $X$ is an element of the set

$$\Big\{ T : \prod_{1}^{m+1} X \to [0, 1] \;\Big|\; \forall x_1, \ldots, x_m \in X,\ \sum_{x \in X} T(x, x_1, \ldots, x_m) = 1 \Big\}$$

Extending the notation introduced above, we denote this set by $\Lambda^X_m$. Following
[17], we use conditional probability notation in our denotation of transmission
functions. Thus an $m$-parent transmission function $T(x, x_1, \ldots, x_m)$ is
denoted $T(x \mid x_1, \ldots, x_m)$.

A transmission function can be used to model the individual-level effect of 
mutation, which operates on one parent and produces one child, and indeed the 
individual-level effect of any variation operation which operates on any numbers 
of parents and produces one child. 

Our scheme for modeling EA dynamics is based on the one used in [17] . We 
model the genomic populations of an EA as distributions over the genome set. 
The population-level effect of the evolutionary operations of an EA is modeled 
by mathematical operators whose inputs and outputs are such distributions. 

The expectation operator, defined below, is used in the definition of the 
selection operator, which follows thereafter. 

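Definition 1. (Expectation Operator) Let $X$ be some finite set, and let
$f : X \to \mathbb{R}^+$ be some function. We define the expectation operator $\mathcal{E}_f : \Lambda^X \cup
0^X \to \mathbb{R}^+ \cup \{0\}$ as follows:

$$\mathcal{E}_f(p) = \sum_{x \in X} f(x)\, p(x)$$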

The selection operator is parameterized by a fitness function. It models 
the effect of fitness proportional selection on a population of genomes. 

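Definition 2. (Selection Operator) Let $X$ be some finite set and let
$f : X \to \mathbb{R}^+$ be some function. We define the Selection Operator $\mathcal{S}_f : \Lambda^X \to \Lambda^X$ as
follows:

$$(\mathcal{S}_f p)(x) = \frac{f(x)\, p(x)}{\mathcal{E}_f(p)}$$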

The population-level effect of variation is modeled by the variation oper- 
ator. This operator is parameterized by a transmission function which models 
the effect of variation at the individual level. 



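Definition 3. (Variation Operator; see footnote 2) Let $X$ be a countable set, and for any
$m \in \mathbb{N}^+$, let $T \in \Lambda^X_m$ be a transmission function over $X$. We define the variation
operator $\mathcal{V}_T : \Lambda^X \to \Lambda^X$ as follows:

$$(\mathcal{V}_T p)(x) = \sum_{(x_1, \ldots, x_m) \in \prod_1^m X} T(x \mid x_1, \ldots, x_m) \prod_{i=1}^m p(x_i)$$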

The next definition describes the projection operator (previously used in
[15] and [17]). A projection operator that is parameterized by some function $\beta$
'projects' distributions over the domain of $\beta$ to distributions over its co-domain.

Definition 4. (Projection Operator) Let $X$ be a countable set, let $Y$ be
some set, and let $\beta : X \to Y$ be a function. We define the projection operator,
$\Xi_\beta : \Lambda^X \to \Lambda^Y$ as follows:

$$(\Xi_\beta p)(y) = \sum_{x \in \langle y \rangle_\beta} p(x)$$

and call $\Xi_\beta p$ the $\beta$-projection of $p$.
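The four operators defined in this section can be sketched on small finite sets as follows. This is a minimal illustration under stated assumptions: distributions are represented as Python dictionaries, a transmission function is a callable `T(child, parents)`, and the toy fitness function, the two-parent uniform crossover, and the first-bit partitioning are hypothetical choices made only for the example.

```python
from itertools import product
from math import prod

def expectation(f, p):
    """E_f(p): mean of f under the distribution p (a dict element -> mass)."""
    return sum(f(x) * px for x, px in p.items())

def select(f, p):
    """S_f: fitness proportional selection."""
    mean = expectation(f, p)
    return {x: f(x) * px / mean for x, px in p.items()}

def vary(T, p, m):
    """V_T: T(child, parents) is the chance of `child` given a tuple of m parents."""
    X = list(p)
    return {x: sum(T(x, parents) * prod(p[xi] for xi in parents)
                   for parents in product(X, repeat=m))
            for x in X}

def project(beta, p):
    """Xi_beta: add up the mass of all elements that beta maps to the same value."""
    out = {}
    for x, px in p.items():
        out[beta(x)] = out.get(beta(x), 0.0) + px
    return out

def uniform_crossover(child, parents):
    """Assumed example operator: each child bit copies either parent with chance 1/2."""
    a, b = parents
    return prod(0.5 * (c == ai) + 0.5 * (c == bi) for c, ai, bi in zip(child, a, b))

def fitness(g):      # toy fitness: one plus the number of 1-bits
    return 1 + g.count('1')

def first_bit(g):    # toy partitioning onto the first locus
    return g[0]

p0 = {g: 0.25 for g in ('00', '01', '10', '11')}
selected = select(fitness, p0)
varied = vary(uniform_crossover, selected, 2)
marginal = project(first_bit, varied)
```

Note that `vary` enumerates all $m$-tuples of parents, so its cost grows as $|X|^{m+1}$; the sketch is meant to mirror the definitions, not to be efficient.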



3 Formalization of a Class of Coarse-Grainings 

The following definition introduces some convenient function-related terminology.

Definition 5. (Partitioning, Theme Set, Themes, Theme Class) Let $X$,
$K$ be sets and let $\beta : X \to K$ be a surjective function. We call $\beta$ a partitioning,
call the co-domain $K$ of $\beta$ the theme set of $\beta$, call any element in $K$ a theme
of $\beta$, and call the pre-image $\langle k \rangle_\beta$ of some $k \in K$, the theme class of $k$ under $\beta$.

The next definition formalizes a class of coarse-grainings in which the 
macroscopic and microscopic state variables always sum to 1. 

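Definition 6 (Semi-Coarsenablity, Coarsenablity, Global Coarsenablity).
Let $G, K$ be sets, let $\mathcal{W} : \Lambda^G \to \Lambda^G$ be an operator, let $\beta : G \to K$
be a partitioning, and let $U \subseteq \Lambda^G$ such that $\Xi_\beta(U) = \Lambda^K$. We say that $\mathcal{W}$ is
semi-coarsenable under $\beta$ on $U$ if there exists an operator $\mathcal{Q} : \Lambda^K \to \Lambda^K$ such
that for all $p \in U$, $\mathcal{Q} \circ \Xi_\beta p = \Xi_\beta \circ \mathcal{W} p$, i.e. the following diagram commutes:

$$\begin{array}{ccc}
U & \xrightarrow{\ \mathcal{W}\ } & \Lambda^G \\
\downarrow {\scriptstyle \Xi_\beta} & & \downarrow {\scriptstyle \Xi_\beta} \\
\Lambda^K & \xrightarrow{\ \mathcal{Q}\ } & \Lambda^K
\end{array}$$

Since $\beta$ is surjective, if $\mathcal{Q}$ exists, it is clearly unique; we call it the quotient. We
call $G$, $K$, $\mathcal{W}$, and $U$ the domain, co-domain, primary operator and turf respectively.
If in addition $\mathcal{W}(U) \subseteq U$ we say that $\mathcal{W}$ is coarsenable under $\beta$ on $U$. If
in addition $U = \Lambda^G$ we say that $\mathcal{W}$ is globally coarsenable under $\beta$.

Note that the partition function $\Xi_\beta$ of the coarse-graining is not the same as the
partitioning $\beta$ of the coarsening.

2 also called the Mixing Operator in [19] and [17]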

Global coarsenablity is a stricter condition than coarsenablity, which in 
turn is a stricter condition than semi-coarsenablity. It is easily shown that global 
coarsenablity is equivalent to Vose's notion of compatibility [19, p. 188] (for a
proof see Theorem 17.5 in [19]).

If some operator $\mathcal{W}$ is coarsenable under some function $\beta$ on some turf $U$
with some quotient $\mathcal{Q}$, then for any distribution $p_K \in \Xi_\beta(U)$, and all distributions
$p_G \in \langle p_K \rangle_{\Xi_\beta}$, one can study the projected effect of the repeated application
of $\mathcal{W}$ to $p_G$ simply by studying the effect of the repeated application of $\mathcal{Q}$ to
$p_K$. If the size of $K$ is small then a computational study of the projected effect
of the repeated application of $\mathcal{W}$ to distributions in $U$ becomes feasible.

4 Global Coarsenablity of Variation 

We show that some variation operator $\mathcal{V}_T$ is globally coarsenable under some
partitioning if a relationship, that we call ambivalence, exists between the trans- 
mission function T of the variation operator and the partitioning. 

To illustrate the idea of ambivalence consider a partitioning $\beta$ which partitions
a genome set $G$ into three subsets. Fig 1 depicts the behavior of a two-parent
transmission function that is ambivalent under $\beta$. Given two parents and some
child, the probability that the child will belong to some theme class depends 
only on the theme classes of the parents and not on the specific parent genomes. 
Hence the name 'ambivalent' — it captures the sense that when viewed from 
the coarse-grained level of the theme classes, a transmission function 'does not 
care' about the specific genomes of the parents or the child. 

The definition of ambivalence that follows is equivalent to but more useful
than the definition given in [4].

Definition 7. (Ambivalence) Let $G, K$ be countable sets, let $T \in \Lambda^G_m$ be a
transmission function, and let $\beta : G \to K$ be a partitioning. We say that $T$ is
ambivalent under $\beta$ if there exists some transmission function $D \in \Lambda^K_m$, such
that for all $k, k_1, \ldots, k_m \in K$ and for any $x_1 \in \langle k_1 \rangle_\beta, \ldots, x_m \in \langle k_m \rangle_\beta$,

$$\sum_{x \in \langle k \rangle_\beta} T(x \mid x_1, \ldots, x_m) = D(k \mid k_1, \ldots, k_m)$$

If such a $D$ exists, it is clearly unique. We denote it by $T^{\vec{\beta}}$ and call it the theme
transmission function.
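For small finite genome sets, ambivalence can be checked by brute force. The sketch below is illustrative only: the helper `theme_transmission`, the two-parent uniform crossover, and the two partitionings (first bit, and parity of the number of 1-bits) are assumptions chosen for the example. The helper returns the candidate theme transmission function as a dictionary, or `None` if two parent tuples with the same themes disagree.

```python
from itertools import product
from math import prod

def theme_transmission(T, G, beta, m, tol=1e-12):
    """Return D[(k, k_1, ..., k_m)] if T is ambivalent under beta, else None."""
    classes = {}
    for x in G:
        classes.setdefault(beta(x), []).append(x)
    D = {}
    for parents in product(G, repeat=m):
        parent_themes = tuple(beta(x) for x in parents)
        for k, members in classes.items():
            # Probability that the child lands somewhere in theme class k.
            mass = sum(T(x, parents) for x in members)
            key = (k,) + parent_themes
            # The mass must depend on the parents' themes only.
            if abs(D.setdefault(key, mass) - mass) > tol:
                return None
    return D

def uniform_crossover(child, parents):
    """Assumed example operator: each child bit copies either parent with chance 1/2."""
    a, b = parents
    return prod(0.5 * (c == ai) + 0.5 * (c == bi) for c, ai, bi in zip(child, a, b))

def first_bit(g):   # a schema partitioning with one defined locus
    return g[0]

def parity(g):      # not a schema partitioning
    return g.count('1') % 2

G = [''.join(bits) for bits in product('01', repeat=3)]
D_schema = theme_transmission(uniform_crossover, G, first_bit, 2)
D_parity = theme_transmission(uniform_crossover, G, parity, 2)
```

In this toy setting uniform crossover is ambivalent under the first-bit partitioning but not under the parity partitioning: the parents 000 and 000 always produce an even-parity child, whereas the parents 000 and 011 (both also of even parity) do so only half the time.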



Fig. 1. Let $\beta : G \to K$ be a coarse-graining which partitions the genome set
$G$ into three theme classes. This figure depicts the behavior of a two-parent variation
operator that is ambivalent under $\beta$. The small dots denote specific genomes
and the solid unlabeled arrows denote the recombination of these genomes. A
dashed arrow denotes that a child from a recombination may be produced 'somewhere'
within the theme class that it points to, and the label of a dashed arrow
denotes the probability with which this might occur. As the diagram shows, the
probability that the child of a variation operation will belong to a particular
theme class depends only on the theme classes of the parents and not on their
specific genomes.



Suppose $T \in \Lambda^X_m$ is ambivalent under some $\beta : X \to K$. We can use the
projection operator to express the projection of $T$ under $\beta$ as follows: for all
$k, k_1, \ldots, k_m \in K$, and any $x_1 \in \langle k_1 \rangle_\beta, \ldots, x_m \in \langle k_m \rangle_\beta$, $T^{\vec{\beta}}(k \mid k_1, \ldots, k_m)$ is
given by $(\Xi_\beta(T(\cdot \mid x_1, \ldots, x_m)))(k)$. The notion of ambivalence is equivalent to a
generalization of Toussaint's notion of trivial neutrality [17, p. 26]. A one-parent
transmission function is ambivalent under a mapping to the set of phenotypes if
and only if it is trivially neutral.

The following theorem shows that a variation operator is globally coarsen- 
able under some partitioning if it is parameterized by a transmission function 
which is ambivalent under that partitioning. The method by which we prove this
theorem extends the method used in the proof of Theorem 1.2.2 in [17].

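Theorem 1 (Global Coarsenablity of Variation). Let $G$ and $K$ be countable
sets, let $T \in \Lambda^G_m$ be a transmission function and let $\beta : G \to K$ be some
partitioning such that $T$ is ambivalent under $\beta$. Then $\mathcal{V}_T : \Lambda^G \to \Lambda^G$ is globally
coarsenable under $\beta$ with quotient $\mathcal{V}_{T^{\vec{\beta}}}$, i.e. the following diagram commutes:

$$\begin{array}{ccc}
\Lambda^G & \xrightarrow{\ \mathcal{V}_T\ } & \Lambda^G \\
\downarrow {\scriptstyle \Xi_\beta} & & \downarrow {\scriptstyle \Xi_\beta} \\
\Lambda^K & \xrightarrow{\ \mathcal{V}_{T^{\vec{\beta}}}\ } & \Lambda^K
\end{array}$$

Proof: For any $p \in \Lambda^G$ and any $k \in K$,

$$\begin{aligned}
(\Xi_\beta \circ \mathcal{V}_T p)(k)
&= \sum_{x \in \langle k \rangle_\beta} \; \sum_{(x_1, \ldots, x_m) \in \prod_1^m G} T(x \mid x_1, \ldots, x_m) \prod_{i=1}^m p(x_i) \\
&= \sum_{(x_1, \ldots, x_m) \in \prod_1^m G} \; \sum_{x \in \langle k \rangle_\beta} T(x \mid x_1, \ldots, x_m) \prod_{i=1}^m p(x_i) \\
&= \sum_{(x_1, \ldots, x_m) \in \prod_1^m G} \; \prod_{i=1}^m p(x_i) \sum_{x \in \langle k \rangle_\beta} T(x \mid x_1, \ldots, x_m) \\
&= \sum_{(k_1, \ldots, k_m) \in \prod_1^m K} \; \sum_{(x_1, \ldots, x_m) \in \prod_{j=1}^m \langle k_j \rangle_\beta} \; \prod_{i=1}^m p(x_i) \sum_{x \in \langle k \rangle_\beta} T(x \mid x_1, \ldots, x_m) \\
&= \sum_{(k_1, \ldots, k_m) \in \prod_1^m K} \; \sum_{(x_1, \ldots, x_m) \in \prod_{j=1}^m \langle k_j \rangle_\beta} \; \prod_{i=1}^m p(x_i) \; T^{\vec{\beta}}(k \mid k_1, \ldots, k_m) \\
&= \sum_{(k_1, \ldots, k_m) \in \prod_1^m K} T^{\vec{\beta}}(k \mid k_1, \ldots, k_m) \sum_{(x_1, \ldots, x_m) \in \prod_{j=1}^m \langle k_j \rangle_\beta} \; \prod_{i=1}^m p(x_i) \\
&= \sum_{(k_1, \ldots, k_m) \in \prod_1^m K} T^{\vec{\beta}}(k \mid k_1, \ldots, k_m) \sum_{x_1 \in \langle k_1 \rangle_\beta} \cdots \sum_{x_m \in \langle k_m \rangle_\beta} p(x_1) \cdots p(x_m) \\
&= \sum_{(k_1, \ldots, k_m) \in \prod_1^m K} T^{\vec{\beta}}(k \mid k_1, \ldots, k_m) \Big( \sum_{x_1 \in \langle k_1 \rangle_\beta} p(x_1) \Big) \cdots \Big( \sum_{x_m \in \langle k_m \rangle_\beta} p(x_m) \Big) \\
&= \sum_{(k_1, \ldots, k_m) \in \prod_1^m K} T^{\vec{\beta}}(k \mid k_1, \ldots, k_m) \prod_{i=1}^m (\Xi_\beta p)(k_i) \\
&= (\mathcal{V}_{T^{\vec{\beta}}} \circ \Xi_\beta p)(k) \qquad \square
\end{aligned}$$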
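The identity established by Theorem 1 can be checked numerically on a toy instance. The sketch below is illustrative and rests on assumptions that are not part of the paper: genomes are 3-bit strings, variation is two-parent uniform crossover, the partitioning maps a genome to its first bit, and the theme transmission function for this particular case has been worked out by hand as $\tfrac{1}{2}[k = k_1] + \tfrac{1}{2}[k = k_2]$.

```python
import random
from itertools import product
from math import prod

def vary(T, p, m):
    """V_T applied to a distribution p (a dict element -> mass)."""
    X = list(p)
    return {x: sum(T(x, parents) * prod(p[xi] for xi in parents)
                   for parents in product(X, repeat=m))
            for x in X}

def project(beta, p):
    """Xi_beta applied to a distribution p."""
    out = {}
    for x, px in p.items():
        out[beta(x)] = out.get(beta(x), 0.0) + px
    return out

def uniform_crossover(child, parents):
    """Assumed example operator: each child bit copies either parent with chance 1/2."""
    a, b = parents
    return prod(0.5 * (c == ai) + 0.5 * (c == bi) for c, ai, bi in zip(child, a, b))

def first_bit(g):
    return g[0]

def theme_crossover(k, parent_themes):
    """Hand-derived theme transmission function for this toy case only."""
    return sum(0.5 for kt in parent_themes if kt == k)

# An arbitrary (seeded) distribution over the 3-bit genomes.
random.seed(0)
G = [''.join(bits) for bits in product('01', repeat=3)]
weights = [random.random() for _ in G]
p = {g: w / sum(weights) for g, w in zip(G, weights)}

lhs = project(first_bit, vary(uniform_crossover, p, 2))    # project after varying
rhs = vary(theme_crossover, project(first_bit, p), 2)      # vary after projecting
max_gap = max(abs(lhs[k] - rhs[k]) for k in rhs)
```

Up to floating point error the two routes around the diagram agree, even though the right-hand route only ever manipulates a distribution over two themes.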



The implicit parallelism theorem in [20] is similar to the theorem above.
Note however that the former theorem only shows that variation is globally
coarsenable if firstly, the genome set consists of "fixed length strings, where the
size of the alphabet can vary from position to position", secondly the partition
over the genome set is a schema partition, and thirdly variation is 'structural'
(see [20] for details). The global coarsenablity of variation theorem has none of
these specific requirements. Instead it is premised on the existence of an abstract 
relationship - ambivalence - between the variation operation and a partition- 
ing. The abstract nature of this relationship makes this theorem applicable to 
evolutionary algorithms other than GAs. In addition this theorem illuminates 
the essential relationship between 'structural' variation and schemata which was 
used (implicitly) in the proof of the implicit parallelism theorem. 

In [4] it is shown that a variation operator that models any combination
of variation operations that are commonly used in GAs — i.e. any combination 
of mask based crossover and 'canonical' mutation, in any order — is ambivalent 
under any partitioning that maps bitstrings to schemata (such a partitioning is 
called a schema partitioning). Therefore 'common' variation in IPGAs is globally 
coarsenable under any schema partitioning. This is precisely the result of the 
implicit parallelism theorem. 

5 Limitwise Semi-Coarsenablity of Selection 

For some fitness function $f : G \to \mathbb{R}^+$ and some partitioning $\beta : G \to K$ let us say
that $f$ is thematically invariant under $\beta$ if, for any theme $k \in K$, the genomes
that belong to $\langle k \rangle_\beta$ all have the same fitness. Paraphrasing the discussion in
[20] using the terminology developed in this paper, Wright et al. argue that if
the selection operator is globally coarsenable under some schema partitioning
$\beta : G \to K$ then the fitness function that parameterizes the selection operator is
'schematically' invariant under $\beta$. It is relatively simple to use contradiction to
prove a generalization of this statement for arbitrary partitionings.

Schematic invariance is a very strict condition for a fitness function. An 
IPGA whose fitness function meets this condition is unlikely to yield any sub- 
stantive information about the dynamics of real world GAs. 

As stated above, the selection operator is not globally coarsenable unless
the fitness function satisfies thematic invariance. However, if the set of distributions
that selection operates over (i.e. the turf) is appropriately constrained,
then, as we show in this section, the selection operator is semi-coarsenable over 
the turf even when the fitness function only satisfies a much weaker condition 
called thematic mean invariance. 

For any partitioning $\beta : G \to K$, any theme $k$, and any distribution $p \in \Lambda^G$,
the theme conditional operator, defined below, returns a conditional distribution
in $\Lambda^G$ that is obtained by normalizing the probability mass of the elements in
$\langle k \rangle_\beta$ by $(\Xi_\beta p)(k)$.



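Definition 8 (Theme Conditional Operator). Let $G$ be some countable set,
let $K$ be some set, and let $\beta : G \to K$ be some function. We define the theme
conditional operator $\mathcal{C}_\beta : \Lambda^G \times K \to \Lambda^G \cup 0^G$ as follows: For any $p \in \Lambda^G$, and
any $k \in K$, $\mathcal{C}_\beta(p, k) \in \Lambda^G \cup 0^G$ such that for any $x \in \langle k \rangle_\beta$,

$$(\mathcal{C}_\beta(p, k))(x) = \begin{cases} 0 & \text{if } (\Xi_\beta p)(k) = 0 \\[4pt] \dfrac{p(x)}{(\Xi_\beta p)(k)} & \text{otherwise} \end{cases}$$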

A useful property of the theme conditional operator is that it can be 
composed with the expected fitness operator to give an operator that returns 
the average fitness of the genomes in some theme class. To be precise, given 
some finite genome set $G$, some partitioning $\beta : G \to K$, some fitness function
$f : G \to \mathbb{R}^+$, some distribution $p \in \Lambda^G$, and some theme $k \in K$, $\mathcal{E}_f \circ \mathcal{C}_\beta(p, k)$
is the average fitness of the genomes in $\langle k \rangle_\beta$. This property proves useful in the
following definition. 

Definition 9 (Bounded Thematic Mean Divergence, Thematic Mean
Invariance). Let $G$ be some finite set, let $K$ be some set, let $\beta : G \to K$ be a
partitioning, let $f : G \to \mathbb{R}^+$ and $f^* : K \to \mathbb{R}^+$ be functions, let $U \subseteq \Lambda^G$, and
let $\delta \in \mathbb{R}^+_0$. We say that the thematic mean divergence of $f$ with respect to $f^*$
on $U$ under $\beta$ is bounded by $\delta$ if, for any $p \in U$ and for any $k \in K$

$$|\mathcal{E}_f \circ \mathcal{C}_\beta(p, k) - f^*(k)| \le \delta$$

If $\delta = 0$ we say that $f$ is thematically mean invariant with respect to $f^*$ on $U$.
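The quantities in Definitions 8 and 9 can be computed directly for small examples. The sketch below is a minimal illustration: the helper names, the dictionary-based representation, the toy fitness values, and the choice to assign zero mass to genomes outside the theme class are all assumptions made for the example. It computes the smallest bound $\delta$ that holds over a finite list of distributions.

```python
def theme_mass(beta, p, k):
    """(Xi_beta p)(k): total mass of the theme class of k."""
    return sum(px for x, px in p.items() if beta(x) == k)

def theme_conditional(beta, p, k):
    """C_beta(p, k): p restricted to the theme class of k and renormalized.

    Genomes outside the class are given mass 0 (an assumption of this sketch).
    """
    mass = theme_mass(beta, p, k)
    return {x: (px / mass if mass > 0 and beta(x) == k else 0.0)
            for x, px in p.items()}

def thematic_mean(f, beta, p, k):
    """E_f o C_beta(p, k): average fitness within the theme class of k."""
    return sum(f[x] * cx for x, cx in theme_conditional(beta, p, k).items())

def thematic_mean_divergence(f, f_star, beta, U):
    """Smallest delta bounding |E_f o C_beta(p, k) - f*(k)| over p in U, k in K."""
    return max(abs(thematic_mean(f, beta, p, k) - f_star[k])
               for p in U for k in f_star)

def first_bit(g):
    return g[0]

# Toy data: f* holds the plain averages of f over each theme class.
f = {'00': 1.0, '01': 3.0, '10': 2.0, '11': 6.0}
f_star = {'0': 2.0, '1': 4.0}

uniform = {g: 0.25 for g in f}
skewed = {'00': 0.4, '01': 0.1, '10': 0.25, '11': 0.25}

delta_uniform = thematic_mean_divergence(f, f_star, first_bit, [uniform])
delta_both = thematic_mean_divergence(f, f_star, first_bit, [uniform, skewed])
```

On the uniform distribution alone the divergence is zero, so `f` is thematically mean invariant with respect to `f_star` on that turf; adding the skewed distribution raises the bound to about 0.6 because the within-class average for theme `'0'` drops to 1.4.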

The next definition gives us a means to measure a 'distance' between real 
valued functions over finite sets. 

Definition 10 (Manhattan Distance Between Real Valued Functions).
Let $X$ be a finite set then for any functions $f, h$ of type $X \to \mathbb{R}$ we define the
Manhattan distance between $f$ and $h$, denoted by $d(f, h)$, as follows:

$$d(f, h) = \sum_{x \in X} |f(x) - h(x)|$$

It is easily checked that d is a metric. 

Let $f : G \to \mathbb{R}^+$, $\beta : G \to K$ and $f^* : K \to \mathbb{R}^+$ be functions with finite
domains, and let $U \subseteq \Lambda^G$. The following theorem shows that if the thematic
mean divergence of $f$ with respect to $f^*$ on $U$ under $\beta$ is bounded by some $\delta$,
then in the limit as $\delta \to 0$, $\mathcal{S}_f$ is semi-coarsenable under $\beta$ on $U$.

Theorem 2 (Limitwise Semi-Coarsenablity of Selection). Let $G$ and $K$
be finite sets, let $\beta : G \to K$ be a partitioning, let $U \subseteq \Lambda^G$ such that $\Xi_\beta(U) =
\Lambda^K$, let $f : G \to \mathbb{R}^+$, $f^* : K \to \mathbb{R}^+$ be some functions such that the thematic
mean divergence of $f$ with respect to $f^*$ on $U$ under $\beta$ is bounded by $\delta$, then for
any $p \in U$ and any $\epsilon > 0$ there exists a $\delta' > 0$ such that,

$$\delta < \delta' \Rightarrow d(\Xi_\beta \circ \mathcal{S}_f p, \mathcal{S}_{f^*} \circ \Xi_\beta p) < \epsilon$$



We depict the result of this theorem as follows: 




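Proof: For any $p \in U$ and for any $k \in K$,

$$\begin{aligned}
(\Xi_\beta \circ \mathcal{S}_f p)(k)
&= \frac{\sum_{g \in \langle k \rangle_\beta} f(g) \cdot p(g)}{\sum_{g' \in G} f(g') \cdot p(g')} \\
&= \frac{\sum_{g \in \langle k \rangle_\beta} f(g) \cdot (\Xi_\beta p)(k) \cdot (\mathcal{C}_\beta(p, k))(g)}{\sum_{k' \in K} \sum_{g' \in \langle k' \rangle_\beta} f(g') \cdot (\Xi_\beta p)(k') \cdot (\mathcal{C}_\beta(p, k'))(g')} \\
&= \frac{(\Xi_\beta p)(k) \sum_{g \in \langle k \rangle_\beta} f(g) \cdot (\mathcal{C}_\beta(p, k))(g)}{\sum_{k' \in K} (\Xi_\beta p)(k') \sum_{g' \in \langle k' \rangle_\beta} f(g') \cdot (\mathcal{C}_\beta(p, k'))(g')} \\
&= \frac{(\Xi_\beta p)(k) \cdot \mathcal{E}_f \circ \mathcal{C}_\beta(p, k)}{\sum_{k' \in K} (\Xi_\beta p)(k') \cdot \mathcal{E}_f \circ \mathcal{C}_\beta(p, k')} \\
&= (\mathcal{S}_{\mathcal{E}_f \circ \mathcal{C}_\beta(p, \cdot)} \circ \Xi_\beta p)(k)
\end{aligned}$$

So we have that

$$d(\Xi_\beta \circ \mathcal{S}_f p, \mathcal{S}_{f^*} \circ \Xi_\beta p) = d(\mathcal{S}_{\mathcal{E}_f \circ \mathcal{C}_\beta(p, \cdot)} \circ \Xi_\beta p, \mathcal{S}_{f^*} \circ \Xi_\beta p)$$

By Lemma 4 (in the appendix) for any $\epsilon > 0$ there exists a $\delta_1 > 0$ such that,

$$d(\mathcal{E}_f \circ \mathcal{C}_\beta(p, \cdot), f^*) < \delta_1 \Rightarrow d(\mathcal{S}_{\mathcal{E}_f \circ \mathcal{C}_\beta(p, \cdot)}(\Xi_\beta p), \mathcal{S}_{f^*}(\Xi_\beta p)) < \epsilon$$

Now, if $\delta < \frac{\delta_1}{|K|}$, then $d(\mathcal{E}_f \circ \mathcal{C}_\beta(p, \cdot), f^*) < \delta_1$. $\square$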



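Corollary 1. If $\delta = 0$, i.e. if $f$ is thematically mean invariant with respect to
$f^*$ on $U$, then $\mathcal{S}_f$ is semi-coarsenable under $\beta$ on $U$ with quotient $\mathcal{S}_{f^*}$, i.e. the
following diagram commutes:

$$\begin{array}{ccc}
U & \xrightarrow{\ \mathcal{S}_f\ } & \Lambda^G \\
\downarrow {\scriptstyle \Xi_\beta} & & \downarrow {\scriptstyle \Xi_\beta} \\
\Lambda^K & \xrightarrow{\ \mathcal{S}_{f^*}\ } & \Lambda^K
\end{array}$$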
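A small numerical check of the corollary may help fix ideas. The sketch below is illustrative only: the fitness values, the first-bit partitioning, and the two test distributions are assumptions chosen for the example. The first distribution is uniform within each theme class, so the within-class average fitness equals `f_star` and the two routes around the diagram agree; the second is not, and a gap appears.

```python
def select(f, p):
    """S_f: fitness proportional selection on a dict-based distribution."""
    mean = sum(f[x] * px for x, px in p.items())
    return {x: f[x] * px / mean for x, px in p.items()}

def project(beta, p):
    """Xi_beta: add up the mass of all elements that beta maps to the same value."""
    out = {}
    for x, px in p.items():
        out[beta(x)] = out.get(beta(x), 0.0) + px
    return out

def manhattan(p, q):
    """d(p, q): sum of absolute differences over the union of the supports."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def first_bit(g):
    return g[0]

# Toy data: f* holds the plain averages of f over each theme class.
f = {'00': 1.0, '01': 3.0, '10': 2.0, '11': 6.0}
f_star = {'0': 2.0, '1': 4.0}

# Uniform within each theme class, but with unequal class masses.
p = {'00': 0.1, '01': 0.1, '10': 0.4, '11': 0.4}
gap = manhattan(project(first_bit, select(f, p)),
                select(f_star, project(first_bit, p)))

# Not uniform within theme class '0': thematic mean invariance fails here.
q = {'00': 0.19, '01': 0.01, '10': 0.4, '11': 0.4}
gap_skewed = manhattan(project(first_bit, select(f, q)),
                       select(f_star, project(first_bit, q)))
```

The first gap is zero up to floating point error, while the second is clearly positive, which is the behaviour the constraint on the turf is meant to capture.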



6 Limitwise Coarsenablity of Evolution 



The two definitions below formalize the idea of an infinite population model of
an EA, and its dynamics (see footnote 3).

Definition 11 (Evolution Machine). An evolution machine (EM) is a tuple
$(G, T, f)$ where $G$ is some set called the domain, $f : G \to \mathbb{R}^+$ is a function called
the fitness function and $T \in \Lambda^G_m$ is called the transmission function.

Definition 12 (Evolution Epoch Operator). Let $E = (G, T, f)$ be an evolution
machine. We define the evolution epoch operator $\mathcal{G}_E : \Lambda^G \to \Lambda^G$ as follows:

$$\mathcal{G}_E = \mathcal{V}_T \circ \mathcal{S}_f$$

For some evolution machine E, our aim is to give sufficient conditions 
under which, for any $t \in \mathbb{Z}^+$, $\mathcal{G}_E^t$ approaches coarsenablity in the limit. The
following definition gives us a formal way to state one of these conditions. 

Definition 13 (Non-Departure). Let $E = (G, T, f)$ be an evolution machine,
and let $U \subseteq \Lambda^G$. We say that $E$ is non-departing over $U$ if

$$\mathcal{V}_T \circ \mathcal{S}_f(U) \subseteq U$$

Note that our definition does not require $\mathcal{S}_f(U) \subseteq U$ in order for $E$ to be
non-departing over $U$.

Theorem 3 (Limitwise Coarsenablity of Evolution). Let $E = (G, T, f)$ be
an evolution machine such that $G$ is finite, let $\beta : G \to K$ be some partitioning,
let $f^* : K \to \mathbb{R}^+$ be some function, let $\delta \in \mathbb{R}^+_0$, and let $U \subseteq \Lambda^G$ such that
$\Xi_\beta(U) = \Lambda^K$. Suppose that the following statements are true:

1. The thematic mean divergence of $f$ with respect to $f^*$ on $U$ under $\beta$ is
bounded by $\delta$

2. $T$ is ambivalent under $\beta$

3. $E$ is non-departing over $U$

Then, letting $E^* = (K, T^{\vec{\beta}}, f^*)$ be an evolution machine, for any $t \in \mathbb{Z}^+$ and
any $p \in U$,

1. $\mathcal{G}_E^t p \in U$

2. For any $\epsilon > 0$, there exists $\delta' > 0$ such that,

$$\delta < \delta' \Rightarrow d(\Xi_\beta \circ \mathcal{G}_E^t p, \mathcal{G}_{E^*}^t \circ \Xi_\beta p) < \epsilon$$

3 The definition of an EM given here is different from its definition in [2, 3]. The
fitness function in this definition maps genomes directly to fitness values. It therefore 
subsumes the genotype-to-phenotype and the phenotype-to-fitness functions of the 
previous definition. In previous work these two functions were always composed 
together; their subsumption within a single function increases clarity. 



We depict the result of this theorem as follows: 





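Proof: We prove the theorem for any $t \in \mathbb{Z}^+_0$. The proof is by induction on
$t$. The base case, when $t = 0$, is trivial. For some $n \in \mathbb{Z}^+_0$, let us assume the
hypothesis for $t = n$. We now show that it is true for $t = n + 1$. For any $p \in U$,
by the inductive assumption $\mathcal{G}_E^n p$ is in $U$. Therefore, since $E$ is non-departing
over $U$, $\mathcal{G}_E^{n+1} p \in U$. This completes the proof of the first part of the hypothesis.
For a proof of the second part note that,

$$\begin{aligned}
& d(\Xi_\beta \circ \mathcal{G}_E^{n+1} p, \; \mathcal{G}_{E^*}^{n+1} \circ \Xi_\beta p) \\
&= d(\Xi_\beta \circ \mathcal{V}_T \circ \mathcal{S}_f \circ \mathcal{G}_E^n p, \; \mathcal{V}_{T^{\vec{\beta}}} \circ \mathcal{S}_{f^*} \circ \mathcal{G}_{E^*}^n \circ \Xi_\beta p) \\
&= d(\mathcal{V}_{T^{\vec{\beta}}} \circ \Xi_\beta \circ \mathcal{S}_f \circ \mathcal{G}_E^n p, \; \mathcal{V}_{T^{\vec{\beta}}} \circ \mathcal{S}_{f^*} \circ \mathcal{G}_{E^*}^n \circ \Xi_\beta p) \quad \text{(by theorem 1)}
\end{aligned}$$

Hence, for any $\epsilon > 0$, by Lemma 2 there exists $\delta_1$ such that

$$d(\Xi_\beta \circ \mathcal{S}_f \circ \mathcal{G}_E^n p, \; \mathcal{S}_{f^*} \circ \mathcal{G}_{E^*}^n \circ \Xi_\beta p) < \delta_1 \Rightarrow d(\Xi_\beta \circ \mathcal{G}_E^{n+1} p, \; \mathcal{G}_{E^*}^{n+1} \circ \Xi_\beta p) < \epsilon$$

As $d$ is a metric it satisfies the triangle inequality. Therefore we have that

$$\begin{aligned}
d(\Xi_\beta \circ \mathcal{S}_f \circ \mathcal{G}_E^n p, \; \mathcal{S}_{f^*} \circ \mathcal{G}_{E^*}^n \circ \Xi_\beta p) \le{} & d(\Xi_\beta \circ \mathcal{S}_f \circ \mathcal{G}_E^n p, \; \mathcal{S}_{f^*} \circ \Xi_\beta \circ \mathcal{G}_E^n p) \\
& + d(\mathcal{S}_{f^*} \circ \Xi_\beta \circ \mathcal{G}_E^n p, \; \mathcal{S}_{f^*} \circ \mathcal{G}_{E^*}^n \circ \Xi_\beta p)
\end{aligned}$$

By our inductive assumption $\mathcal{G}_E^n p \in U$. So, by theorem 2 there exists a $\delta_2$ such
that

$$\delta < \delta_2 \Rightarrow d(\Xi_\beta \circ \mathcal{S}_f \circ \mathcal{G}_E^n p, \; \mathcal{S}_{f^*} \circ \Xi_\beta \circ \mathcal{G}_E^n p) < \frac{\delta_1}{2}$$

By lemma 3 there exists a $\delta_3$ such that

$$d(\Xi_\beta \circ \mathcal{G}_E^n p, \; \mathcal{G}_{E^*}^n \circ \Xi_\beta p) < \delta_3 \Rightarrow d(\mathcal{S}_{f^*} \circ \Xi_\beta \circ \mathcal{G}_E^n p, \; \mathcal{S}_{f^*} \circ \mathcal{G}_{E^*}^n \circ \Xi_\beta p) < \frac{\delta_1}{2}$$

By our inductive assumption, there exists a $\delta_4$ such that

$$\delta < \delta_4 \Rightarrow d(\Xi_\beta \circ \mathcal{G}_E^n p, \; \mathcal{G}_{E^*}^n \circ \Xi_\beta p) < \delta_3$$

Therefore, letting $\delta' = \min(\delta_2, \delta_4)$ we get that

$$\delta < \delta' \Rightarrow d(\Xi_\beta \circ \mathcal{G}_E^{n+1} p, \; \mathcal{G}_{E^*}^{n+1} \circ \Xi_\beta p) < \epsilon \qquad \square$$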

The limitwise coarsenablity of evolution theorem is very general. As we
have not committed ourselves to any particular genomic data-structure the
coarse-graining result we have obtained is applicable to any IPEA provided that
it satisfies three abstract conditions: bounded thematic mean divergence, ambivalence,
and non-departure. The fidelity of the coarse-graining depends on
the minimal bound on the thematic mean divergence. Maximum fidelity is
achieved in the limit as this minimal bound tends to zero.

7 Sufficient Conditions for Coarse-Graining IPGA 
Dynamics 

We now use the result in the previous section to argue that the dynamics of 
an IPGA with long genomes, uniform crossover, and fitness proportional se- 
lection can be coarse-grained with high fidelity for a relatively coarse schema 
partitioning, provided that the initial population satisfies a constraint called ap- 
proximate schematic uniformity and the fitness function satisfies a constraint 
called low-variance schematic fitness distribution. We stress at the outset that 
our argument is principled but informal, i.e. though the argument rests relatively 
straightforwardly on theorem 3, we do find it necessary in places to appeal to 
the reader's intuitive understanding of GA dynamics. 

For any $n \in \mathbb{Z}^+$, let $\mathfrak{B}_n$ be the set of all bitstrings of length $n$. For some
$\ell \gg 1$ and some $m \ll \ell$, let $\beta : \mathfrak{B}_\ell \to \mathfrak{B}_m$ be some schema partitioning. Let
$f^* : \mathfrak{B}_m \to \mathbb{R}^+$ be some function. For each $k \in \mathfrak{B}_m$, let $D_k \in \Delta^{\mathbb{R}}$ be some
distribution over the reals with low variance such that the mean of distribution
$D_k$ is $f^*(k)$. Let $f : \mathfrak{B}_\ell \to \mathbb{R}^+$ be a fitness function such that for any $k \in \mathfrak{B}_m$,
the fitness values of the elements of $\langle k \rangle_\beta$ are independently drawn from the
distribution $D_k$. For such a fitness function we say that fitness is schematically
distributed with low-variance.
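The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`make_schema_partition`, `draw_fitness`), the choice of defining loci, the values of $f^*$, and the use of a narrow uniform distribution for each $D_k$ are all hypothetical and are not taken from the paper.

```python
import random

def make_schema_partition(loci):
    """Return beta: map an ell-bit genome (tuple of 0/1) to its bits at `loci`."""
    def beta(genome):
        return tuple(genome[i] for i in loci)
    return beta

def draw_fitness(ell, beta, f_star, spread, rng):
    """Draw f(x) independently from D_k with k = beta(x).

    Assumption: D_k is uniform on [f*(k) - spread, f*(k) + spread], which has
    mean f*(k) and low variance when `spread` is small.
    """
    f = {}
    for x in range(1 << ell):
        genome = tuple((x >> i) & 1 for i in range(ell))
        k = beta(genome)
        f[genome] = rng.uniform(f_star[k] - spread, f_star[k] + spread)
    return f

rng = random.Random(0)
ell, loci = 10, (0, 5)                      # hypothetical genome length and defining loci
beta = make_schema_partition(loci)
f_star = {(0, 0): 1.0, (0, 1): 1.5, (1, 0): 1.2, (1, 1): 2.0}
f = draw_fitness(ell, beta, f_star, spread=0.05, rng=rng)

# Mean fitness of each schema; each should sit close to f*(k).
class_means = {}
for k in f_star:
    values = [v for genome, v in f.items() if beta(genome) == k]
    class_means[k] = sum(values) / len(values)
    print(k, f_star[k], class_means[k])
```

Any other low-variance family with the right means would serve equally well; the uniform choice simply keeps every fitness value strictly positive.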

Let $U$ be a set of distributions such that for any $k \in \mathfrak{B}_m$ and any $p \in U$,
$\mathcal{C}_\beta(p, k)$ is approximately uniform. It is easily checked that $U$ satisfies the
condition $\Xi_\beta(U) = \Delta^{\mathfrak{B}_m}$. We say that the distributions in $U$ are approximately
schematically uniform.

Let $\delta$ be the minimal bound such that for all $p \in U$ and for all $k \in \mathfrak{B}_m$,
$|\mathcal{E}_f \circ \mathcal{C}_\beta(p, k) - f^*(k)| \le \delta$. Then, for any $\epsilon > 0$, $P(\delta < \epsilon) \to 1$ as $\ell - m \to \infty$.
Because we have chosen $\ell$ and $m$ such that $\ell - m$ is 'large', it is reasonable to
assume that the minimal bound on the schematic mean divergence of $f$ on $U$
under $\beta$ is likely to be 'low'.
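The effect of $\ell - m$ on the divergence can be illustrated numerically. The sketch below is a simplification and not the paper's procedure: it evaluates the divergence only at the exactly uniform population, where the conditional expected fitness of a schema is just the average fitness of its members. That value is a lower bound on the minimal bound taken over all of $U$, so it indicates the trend rather than the bound itself. The Gaussian form of $D_k$, the values of $f^*$, and the function name are assumptions.

```python
import random

def divergence_at_uniform(ell, m, f_star, sigma, rng):
    """Largest gap between f*(k) and the mean fitness of schema k.

    Each schema contains 2**(ell - m) genomes whose fitness values are drawn
    independently from a Gaussian D_k with mean f*(k) and standard deviation
    sigma (an assumed choice of low-variance distribution).
    """
    class_size = 1 << (ell - m)
    worst = 0.0
    for k in range(1 << m):
        total = sum(rng.gauss(f_star[k], sigma) for _ in range(class_size))
        worst = max(worst, abs(total / class_size - f_star[k]))
    return worst

rng = random.Random(0)
m, sigma = 2, 0.1
f_star = [1.0, 1.5, 1.2, 2.0]               # hypothetical schema fitness means
divergences = {
    ell: divergence_at_uniform(ell, m, f_star, sigma, rng)
    for ell in (4, 8, 12, 16)
}
for ell, gap in divergences.items():
    print(f"ell - m = {ell - m:2d}   divergence at uniform p = {gap:.5f}")
```

Because each schema mean is an average of $2^{\ell - m}$ independent draws, its typical deviation from $f^*(k)$ shrinks roughly like $\sigma / \sqrt{2^{\ell - m}}$, which is the intuition behind the limit statement.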

Let $T$ be a transmission function over $\mathfrak{B}_\ell$ that models the application of
uniform crossover. In sections 6 and 7 of [3] we rigorously prove that a
transmission function that models any mask based crossover operation is ambivalent
under any schema partitioning. Uniform crossover is mask based, and $\beta$ is a
schema partitioning, therefore $T$ is ambivalent under $\beta$.
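The practical consequence of ambivalence, namely that projecting after variation gives the same result as varying after projection, can be checked numerically for uniform crossover on short genomes. The following is a brute-force sketch, assuming two-parent uniform crossover with every mask equally likely and genomes encoded as integers; the function names and the chosen loci are illustrative only.

```python
import random

def uniform_crossover(p, n):
    """Exact two-parent uniform crossover on a distribution p over n-bit genomes.

    The child takes parent y's bit where the mask bit is 1 and parent z's bit
    where it is 0; all 2**n masks are equally likely.
    """
    size = 1 << n
    q = [0.0] * size
    for y in range(size):
        for z in range(size):
            w = p[y] * p[z] / size
            if w == 0.0:
                continue
            for mask in range(size):
                q[(y & mask) | (z & ~mask)] += w
    return q

def project(p, loci):
    """Schema partitioning: marginalise p onto the bits at `loci`."""
    q = [0.0] * (1 << len(loci))
    for x, px in enumerate(p):
        k = 0
        for j, i in enumerate(loci):
            k |= ((x >> i) & 1) << j
        q[k] += px
    return q

rng = random.Random(1)
ell, loci = 6, (1, 4)                        # hypothetical sizes, small enough to enumerate
raw = [rng.random() for _ in range(1 << ell)]
total = sum(raw)
p = [v / total for v in raw]                 # an arbitrary population distribution

lhs = project(uniform_crossover(p, ell), loci)        # vary, then project
rhs = uniform_crossover(project(p, loci), len(loci))  # project, then vary
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)                               # zero up to floating-point rounding
```

The enumeration costs on the order of $8^\ell$ operations, so it is only feasible for toy genome lengths; it is meant as a sanity check, not as a substitute for the proof in [3].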

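Let $p_{\frac{1}{2}} \in \Delta^{\mathfrak{B}_1}$ be such that $p_{\frac{1}{2}}(0) = \frac{1}{2}$ and $p_{\frac{1}{2}}(1) = \frac{1}{2}$. For any $p \in U$,
$\mathcal{S}_f p$ may be 'outside' $U$ because there may be one or more $k \in \mathfrak{B}_m$ such that
$\mathcal{C}_\beta(\mathcal{S}_f p, k)$ is not quite uniform. Recall that for any $k \in \mathfrak{B}_m$ the variance of $D_k$ is
low. Therefore even though $\mathcal{S}_f p$ may be 'outside' $U$, the deviation from schematic
uniformity is not likely to be large. Furthermore, given the low variance of $D_k$,
the marginal distributions of $\mathcal{C}_\beta(\mathcal{S}_f p, k)$ will be very close to $p_{\frac{1}{2}}$. Given these
facts and our choice of transmission function, for all $k \in \mathfrak{B}_m$, $\mathcal{C}_\beta(\mathcal{V}_T \circ \mathcal{S}_f p, k)$ will
be more uniform than $\mathcal{C}_\beta(\mathcal{S}_f p, k)$, and we can assume that $\mathcal{V}_T \circ \mathcal{S}_f p$ is in $U$. In
other words, we can assume that $E$ is non-departing over $U$.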

Let $E = (\mathfrak{B}_\ell, T, f)$ and $E^* = (\mathfrak{B}_m, T^{\overrightarrow{\beta}}, f^*)$ be evolution machines. By
the discussion above and the limitwise coarsenability of evolution theorem one
can expect that for any approximately schematically uniform distribution $p \in U$
(including of course the uniform distribution over $\mathfrak{B}_\ell$), the dynamics of $E^*$
when initialized with $\Xi_\beta p$ will approximate the projected dynamics of $E$ when
initialized with $p$. As the bound $\delta$ is 'low', the fidelity of the approximation will
be 'high'.
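A small experiment can make this expectation concrete. The sketch below iterates fitness proportional selection followed by uniform crossover on a toy IPGA and on its coarse counterpart, and records the distance between the projected fine-grained population and the coarse population at each generation. Everything specific here is an assumption made for illustration: the genome length, the partitioning onto the two low-order bits, the values of $f^*$, the uniform noise model for the $D_k$, and the use of the $L_1$ distance for $d$.

```python
import random

def select(p, f):
    """Fitness proportional selection: reweight p by f and renormalise."""
    mean = sum(px * fx for px, fx in zip(p, f))
    return [px * fx / mean for px, fx in zip(p, f)]

def uniform_crossover(p, n):
    """Exact two-parent uniform crossover over n-bit genomes (all masks equally likely)."""
    size = 1 << n
    q = [0.0] * size
    for y in range(size):
        for z in range(size):
            w = p[y] * p[z] / size
            if w == 0.0:
                continue
            for mask in range(size):
                q[(y & mask) | (z & ~mask)] += w
    return q

def project(p, m):
    """Schema partitioning onto the m low-order bits."""
    q = [0.0] * (1 << m)
    for x, px in enumerate(p):
        q[x & ((1 << m) - 1)] += px
    return q

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def coarse_graining_gaps(ell, m, f, f_star, generations):
    """Distance between projected dynamics of E and dynamics of E*, per generation."""
    p = [1.0 / (1 << ell)] * (1 << ell)      # uniform initial population
    q = project(p, m)                        # coarse machine starts from the projection
    gaps = [l1(project(p, m), q)]
    for _ in range(generations):
        p = uniform_crossover(select(p, f), ell)
        q = uniform_crossover(select(q, f_star), m)
        gaps.append(l1(project(p, m), q))
    return gaps

rng = random.Random(0)
ell, m, spread = 6, 2, 0.01
f_star = [1.0, 1.5, 1.2, 2.0]
low = (1 << m) - 1
f_lowvar = [f_star[x & low] + rng.uniform(-spread, spread) for x in range(1 << ell)]
f_invariant = [f_star[x & low] for x in range(1 << ell)]

gaps_lowvar = coarse_graining_gaps(ell, m, f_lowvar, f_star, 4)
gaps_invariant = coarse_graining_gaps(ell, m, f_invariant, f_star, 4)
print("low-variance fitness:", gaps_lowvar)
print("invariant fitness:   ", gaps_invariant)
```

With schematically invariant fitness the gap should vanish up to rounding, which corresponds to the stronger condition discussed in the next paragraph; with low-variance fitness the gap should be small but generally nonzero.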

Note that the constraint that fitness be low-variance schematically dis- 
tributed, which is required for this coarse-graining, is much weaker than the 
very strong constraint of schematic fitness invariance (all genomes in each schema 
must have the same value) which is required to coarse-grain IPGA dynamics in 

8 Conclusion 

It is commonly assumed that the ability to track the frequencies of schemata 
in an evolving infinite population across multiple generations under different 
fitness functions will lead to better theories of adaptation for the simple GA. 
Unfortunately tracking the frequencies of schemata in the naive way described 
in the introduction is computationally intractable for IPGAs with long genomes. 
A previous coarse-graining result [20] suggests that tracking the frequencies of a
family of low order schemata is computationally feasible, regardless of the length 
of the genomes, if fitness is schematically invariant (with respect to the family of 
schemata). Unfortunately this strong constraint on the fitness function renders 
this result useless if one's goal is to understand how GAs perform adaptation on 
real-world fitness functions.

In this paper we developed a simple yet powerful abstract framework for 
modeling evolutionary dynamics. We used this framework to show that the dy- 
namics of an IPEA can be coarse-grained if it satisfies three abstract conditions. 
We then used this result to argue that the evolutionary dynamics of an IPGA 
with fitness proportional selection and uniform crossover can be coarse-grained 
(with high fidelity) under a relatively coarse schema partitioning if the initial 
distribution satisfies a constraint called approximate schematic uniformity (a 
very reasonable condition), and fitness is low-variance schematically distributed.
The latter condition is much weaker than the schematic invariance constraint
previously required to coarse-grain selecto-mutato-recombinative evolutionary
dynamics. 
Appendix

Lemma 1. For any finite set $X$, and any metric space $(T, d)$, let $A : T \to \Delta^X$ and
let $B : X \to [T \to [0, 1]]$ be functions$^{1}$ such that for any $h \in T$, and any $x \in X$,
$(B(x))(h) = (A(h))(x)$. For any $h^* \in T$, if the following statement is true

\[
\forall x \in X,\ \forall \epsilon_x > 0,\ \exists \delta_x > 0,\ \forall h \in T,\quad d(h, h^*) < \delta_x \Rightarrow |(B(x))(h) - (B(x))(h^*)| < \epsilon_x
\]

Then we have that

\[
\forall \epsilon > 0,\ \exists \delta > 0,\ \forall h \in T,\quad d(h, h^*) < \delta \Rightarrow d(A(h), A(h^*)) < \epsilon
\]

This lemma says that $A$ is continuous at $h^*$ if for all $x \in X$, $B(x)$ is continuous at $h^*$.
PROOF: We first prove the following two claims.

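Claim 1

\[
\begin{aligned}
& \forall x \in X \text{ s.t. } (B(x))(h^*) > 0,\ \forall \epsilon_x > 0,\ \exists \delta_x > 0,\ \forall h \in T, \\
& \qquad d(h, h^*) < \delta_x \Rightarrow |(B(x))(h) - (B(x))(h^*)| < \epsilon_x \cdot (B(x))(h^*)
\end{aligned}
\]

This claim follows from the continuity of $B(x)$ at $h^*$ for all $x \in X$ and the fact that
$(B(x))(h^*)$ is a positive constant w.r.t. $h$.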

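Claim 2 For all $h \in T$

\[
\sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} |(A(h^*))(x) - (A(h))(x)| \;=\; \sum_{\substack{x \in X \text{ s.t.} \\ (A(h))(x) > (A(h^*))(x)}} |(A(h))(x) - (A(h^*))(x)|
\]

The proof of this claim is as follows: for all $h \in T$,

\[
\begin{aligned}
& \sum_{x \in X} \big( (A(h^*))(x) - (A(h))(x) \big) = 0 \\
\Rightarrow\ & \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} \big( (A(h^*))(x) - (A(h))(x) \big) \;-\; \sum_{\substack{x \in X \text{ s.t.} \\ (A(h))(x) > (A(h^*))(x)}} \big( (A(h))(x) - (A(h^*))(x) \big) = 0 \\
\Rightarrow\ & \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} \big( (A(h^*))(x) - (A(h))(x) \big) \;=\; \sum_{\substack{x \in X \text{ s.t.} \\ (A(h))(x) > (A(h^*))(x)}} \big( (A(h))(x) - (A(h^*))(x) \big) \\
\Rightarrow\ & \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} |(A(h^*))(x) - (A(h))(x)| \;=\; \sum_{\substack{x \in X \text{ s.t.} \\ (A(h))(x) > (A(h^*))(x)}} |(A(h))(x) - (A(h^*))(x)|
\end{aligned}
\]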
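Claim 2 is the familiar fact that, for two probability distributions on the same finite set, the total mass by which the first exceeds the second equals the total mass by which the second exceeds the first. A minimal check with exact rational arithmetic, using two arbitrarily chosen distributions (these numbers are illustrative and do not come from the paper):

```python
from fractions import Fraction as F

def excess(a, b):
    """Total amount by which a exceeds b over the points where a(x) > b(x)."""
    return sum(ax - bx for ax, bx in zip(a, b) if ax > bx)

# Two distributions over a four-element set; each sums to 1.
a = [F(1, 2), F(1, 4), F(1, 4), F(0)]
b = [F(1, 8), F(3, 8), F(1, 4), F(1, 4)]

left = excess(a, b)    # mass that a has above b
right = excess(b, a)   # mass that b has above a
print(left, right)     # the two sides of Claim 2
```

Both sides agree because the signed differences sum to zero, which is exactly the first line of the derivation.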
$^{1}$ For any sets $X$, $Y$ we use the notation $[X \to Y]$ to denote the set of all functions
from $X$ to $Y$.



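We now prove the lemma. Using claim 1 and the fact that $X$ is finite, we get that
$\forall \epsilon > 0$, $\exists \delta > 0$, $\forall h \in T$ such that $d(h, h^*) < \delta$,

\[
\begin{aligned}
& \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} |(B(x))(h^*) - (B(x))(h)| \;<\; \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} \frac{\epsilon}{2} \cdot (B(x))(h^*) \\
\Rightarrow\ & \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} |(A(h^*))(x) - (A(h))(x)| \;<\; \frac{\epsilon}{2} \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} (A(h^*))(x) \\
\Rightarrow\ & \sum_{\substack{x \in X \text{ s.t.} \\ (A(h^*))(x) > (A(h))(x)}} |(A(h^*))(x) - (A(h))(x)| \;<\; \frac{\epsilon}{2}
\end{aligned}
\]

By Claim 2 and the result above, we have that $\forall \epsilon > 0$, $\exists \delta > 0$, $\forall h \in T$ such
that $d(h, h^*) < \delta$,

\[
\sum_{\substack{x \in X \text{ s.t.} \\ (A(h))(x) > (A(h^*))(x)}} |(A(h))(x) - (A(h^*))(x)| \;<\; \frac{\epsilon}{2}
\]

Therefore, given the two previous results, we have that $\forall \epsilon > 0$, $\exists \delta > 0$, $\forall h \in T$
such that $d(h, h^*) < \delta$,

\[
\sum_{x \in X} |(A(h))(x) - (A(h^*))(x)| < \epsilon \qquad \Box
\]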

Lemma 2. Let $X$ be a finite set, and let $T$ be a transmission function over $X$. Then
for any $p' \in \Delta^X$ and any $\epsilon > 0$, there exists a $\delta > 0$ such that for any $p \in \Delta^X$,

\[
d(p, p') < \delta \Rightarrow d(\mathcal{V}_T p, \mathcal{V}_T p') < \epsilon
\]

Sketch of Proof: Let $A : \Delta^X \to \Delta^X$ be defined such that $(A(p))(x) = (\mathcal{V}_T p)(x)$. Let
$B : X \to [\Delta^X \to [0, 1]]$ be defined such that $(B(x))(p) = (\mathcal{V}_T p)(x)$. The reader can
check that for any $x \in X$, $B(x)$ is a continuous function. The application of lemma 1
completes the proof.

By similar arguments, we obtain the following two lemmas. 

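Lemma 3. Let $X$ be a finite set, and let $f : X \to \mathbb{R}^+$ be a function. Then for any
$p' \in \Delta^X$ and any $\epsilon > 0$, there exists a $\delta > 0$ such that for any $p \in \Delta^X$,

\[
d(p, p') < \delta \Rightarrow d(\mathcal{S}_f p, \mathcal{S}_f p') < \epsilon
\]

Lemma 4. Let $X$ be a finite set, and let $p \in \Delta^X$ be a distribution. Then for any
$f' \in [X \to \mathbb{R}^+]$, and any $\epsilon > 0$, there exists a $\delta > 0$ such that for any $f \in [X \to \mathbb{R}^+]$,

\[
d(f, f') < \delta \Rightarrow d(\mathcal{S}_f p, \mathcal{S}_{f'} p) < \epsilon
\]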
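The continuity asserted by Lemma 3 can be observed directly on a small example. The sketch below assumes that $d$ is the $L_1$ distance between distributions and that selection is fitness proportional; the fitness values and perturbations are arbitrary illustrative choices. Under these assumptions a short calculation gives the explicit bound $d(\mathcal{S}_f p, \mathcal{S}_f p') \le 2 \, (\max f / \min f) \, d(p, p')$, which the code compares against.

```python
def select(p, f):
    """Fitness proportional selection: reweight p by f and renormalise."""
    mean = sum(px * fx for px, fx in zip(p, f))
    return [px * fx / mean for px, fx in zip(p, f)]

def l1(a, b):
    """L1 distance between two distributions (assumed form of the metric d)."""
    return sum(abs(x - y) for x, y in zip(a, b))

f = [1.0, 2.0, 3.0, 4.0]                     # hypothetical positive fitness values
p_ref = [0.25, 0.25, 0.25, 0.25]             # reference distribution p'
lipschitz = 2.0 * max(f) / min(f)            # bound valid under the stated assumptions

rows = []
for eps in (0.1, 0.01, 0.001):
    p = [0.25 + eps, 0.25 - eps, 0.25, 0.25] # perturbed distribution, still sums to 1
    d_in = l1(p, p_ref)
    d_out = l1(select(p, f), select(p_ref, f))
    rows.append((d_in, d_out))
    print(f"d(p, p') = {d_in:.4f}   d(S_f p, S_f p') = {d_out:.6f}")
```

As the input distance shrinks, the distance between the selected distributions shrinks with it, which is the behaviour the lemma guarantees.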



